State Aggregation for Distributed Value Iteration in Dynamic Programming
Authors
Abstract
We propose a distributed algorithm to solve a dynamic programming problem with multiple agents, where each agent has only partial knowledge of the state transition probabilities and costs. We provide consensus proofs for the presented algorithm and derive error bounds on the obtained value function with respect to what is considered the "true solution" obtained from conventional value iteration. To minimize communication overhead between agents, costs are aggregated and shared only when an update is expected to significantly influence the solutions of other agents. We demonstrate the efficacy of the proposed aggregation method on a large-scale urban traffic routing problem. Individual agents compute the fastest route to a common access point and share local congestion information, allowing a fully distributed solution with minimal communication between agents.
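The error bounds above are stated relative to conventional value iteration. As a point of reference, here is a minimal sketch of that baseline on a hypothetical discounted MDP; the transition probabilities `P`, costs `C`, and sizes below are made-up illustration values, not data from the paper:

```python
import numpy as np

# A hypothetical 3-state, 2-action discounted MDP (illustrative values only).
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s']
C = rng.uniform(0.0, 1.0, size=(n_actions, n_states))             # C[a, s]
gamma = 0.9  # discount factor

def value_iteration(P, C, gamma, tol=1e-8, max_iter=10_000):
    """Conventional value iteration: V <- min_a (C_a + gamma * P_a V)."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = C + gamma * (P @ V)   # Q[a, s]: one Bellman backup per action
        V_new = Q.min(axis=0)     # greedy minimization over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

V_star = value_iteration(P, C, gamma)
```

Since the Bellman operator is a gamma-contraction, the loop converges geometrically to the unique fixed point, which serves as the "true solution" in the error analysis.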
Similar resources
Value Iteration with Options and State Aggregation
This paper presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction. Specifically, we combine state aggregation with the options framework and demonstrate that they work well together and indeed it is only after one combines the two that the full benefit of each is realized. We introduce a hierarchical value iteration algorithm where we first ...
Performance Loss Bounds for Approximate Value Iteration with State Aggregation
We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to using invariant distributions of appropriate policies ...
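The piecewise-constant scheme described above admits a compact sketch: hold one value per partition, lift it to the full state space, apply a Bellman backup, and project back by averaging within each partition. The partition, weights, and MDP below are assumptions for illustration, not the paper's specific construction:

```python
import numpy as np

# Illustrative aggregated value iteration: the cost-to-go is approximated
# by a constant on each partition of the state space (assumed setup).
rng = np.random.default_rng(1)
n_states, n_actions = 6, 2
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s']
C = rng.uniform(0.0, 1.0, size=(n_actions, n_states))             # C[a, s]
gamma = 0.9
partition = np.array([0, 0, 1, 1, 2, 2])  # state -> aggregate index
n_agg = partition.max() + 1

r = np.zeros(n_agg)                            # one constant per partition
for _ in range(500):
    V = r[partition]                           # lift aggregate values to states
    T_V = (C + gamma * (P @ V)).min(axis=0)    # one Bellman backup
    # project back: average the backed-up values within each partition
    r = np.array([T_V[partition == j].mean() for j in range(n_agg)])
```

The composition of lifting, Bellman backup, and averaging projection is still a gamma-contraction, so `r` converges to a fixed point; the performance loss bounds quantify how far the policy derived from that fixed point can be from optimal.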
Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming
Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks. In this paper we propose a new, robust dynamic programming algorithm that unifies value iter...
Aggregation in Stochastic Dynamic Programming
We present a general aggregation method applicable to all finite-horizon Markov decision problems. States of the MDP are aggregated into macro-states based on a pre-selected collection of “distinguished” states which serve as entry points into macro-states. The resulting macro-problem is also an MDP, whose solution approximates an optimal solution to the original problem. The aggregation scheme...
A New Value Iteration Method for the Average Cost Dynamic Programming Problem∗
We propose a new value iteration method for the classical average cost Markovian decision problem, under the assumption that all stationary policies are unichain and that, furthermore, there exists a state that is recurrent under all stationary policies. This method is motivated by a relation between the average cost problem and an associated stochastic shortest path problem. Contrary to the st...
Journal
Journal title: IEEE Control Systems Letters
Year: 2023
ISSN: 2475-1456
DOI: https://doi.org/10.1109/lcsys.2023.3285655